
Add sqlite reader and adjust SQL queries to work there #182

Merged: georgestagg merged 13 commits into main from sqlite, Mar 12, 2026
Conversation

@georgestagg (Collaborator) commented Mar 10, 2026

I went back and forth many times on this PR over whether to introduce SQL-engine-specific syntax for percentiles and other incompatibilities. In the end, I decided not to, instead opting to use SQL that is as engine-agnostic as possible throughout.

Various things don't work in SQLite; here is the situation as it currently stands, as far as I recall:

  • No EXCLUDE. Instead, we keep duplicated columns during the affected queries and drop them from the result.

  • LIMIT 0 does not return the correct column type information. Switch to LIMIT 1.

  • No GREATEST or LEAST. Switch to a utility function that builds the equivalent with CASE WHEN.

  • No QUANTILE_CONT or PERCENTILE_CONT. Return to using an NTILE-based method for boxplot and density.

  • No ANY_VALUE. We can use MIN, which serves as an arbitrary representative value per group.

  • No GENERATE_SERIES. Use a RECURSIVE CTE to generate the series values.
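For illustration, several of these portable patterns can be exercised directly against SQLite. This is a quick sketch using Python's built-in sqlite3 module, outside the Rust codebase; the table and column names are invented for the demo, not taken from the PR:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# GENERATE_SERIES stand-in: a recursive CTE counting from 0 up to n - 1.
series = con.execute(
    "WITH RECURSIVE seq(n) AS ("
    "  SELECT 0 UNION ALL SELECT n + 1 FROM seq WHERE n + 1 < ?"
    ") SELECT n FROM seq",
    (5,),
).fetchall()
print([n for (n,) in series])  # [0, 1, 2, 3, 4]

# GREATEST(a, b) stand-in: CASE WHEN picks the larger of two scalars.
(largest,) = con.execute(
    "SELECT CASE WHEN 3 > 7 THEN 3 ELSE 7 END"
).fetchone()
print(largest)  # 7

# ANY_VALUE stand-in: MIN returns one representative value per group.
con.execute("CREATE TABLE t(grp TEXT, label TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)", [("a", "x"), ("a", "y"), ("b", "z")]
)
reps = con.execute(
    "SELECT grp, MIN(label) FROM t GROUP BY grp ORDER BY grp"
).fetchall()
print(reps)  # [('a', 'x'), ('b', 'z')]
```

Unlike ANY_VALUE, the MIN stand-in is deterministic, which is arguably an improvement for reproducible plot output.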

Closes #134
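The NTILE-based percentile method mentioned above can likewise be sketched against SQLite (window functions require SQLite 3.25+). Again this uses Python's sqlite3 with invented data, and note that NTILE yields a bucketed approximation rather than the interpolated value QUANTILE_CONT would return:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data(val REAL)")
con.executemany(
    "INSERT INTO data VALUES (?)", [(float(i),) for i in range(1, 101)]
)

# Approximate Q1: split the ordered values into four equal tiles and take
# the largest value in the first tile. (QUANTILE_CONT would interpolate.)
(q1,) = con.execute(
    "SELECT MAX(val) FROM ("
    "  SELECT val, NTILE(4) OVER (ORDER BY val) AS tile FROM data"
    ") WHERE tile = 1"
).fetchone()
print(q1)  # 25.0; interpolated Q1 of 1..100 would be 25.75
```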

@georgestagg georgestagg requested a review from thomasp85 March 10, 2026 11:57
Comment on lines -80 to +88
-default = ["duckdb", "sqlite", "vegalite", "ipc", "builtin-data"]
+default = ["duckdb", "sqlite", "vegalite", "ipc", "parquet", "builtin-data"]
 ipc = ["polars/ipc"]
 duckdb = ["dep:duckdb", "dep:arrow"]
 polars-sql = ["polars/sql"]
+builtin-data = ["polars/parquet"]
+parquet = ["polars/parquet"]
 postgres = ["dep:postgres"]
 sqlite = ["dep:rusqlite"]
 vegalite = []
 ggplot2 = []
-builtin-data = []
@georgestagg (Collaborator, Author) commented Mar 10, 2026:

This tweak to features is just prep for Wasm, not related to sqlite.

@georgestagg (Collaborator, Author) commented:

Some more context:

I almost added a new SqlDialect trait with generalised methods on it for GREATEST/LEAST, GENERATE_SERIES, and PERCENTILE_CONT/QUANTILE_CONT. However, I realised that we could use the same (admittedly convoluted) SQL everywhere, even in Snowflake I believe, and so it no longer seemed necessary.

A question arises as to whether there is a performance cost to reaching for the lowest-common-denominator implementations.

I had Claude whip up a benchmark, and these are the results:

  generate_series

  ┌──────┬────────┬──────────┬──────────┐
  │ Size │ Native │ Portable │ Slowdown │
  ├──────┼────────┼──────────┼──────────┤
  │ 64   │ 51 µs  │ 800 µs   │ ~16x     │
  ├──────┼────────┼──────────┼──────────┤
  │ 512  │ 59 µs  │ 1.10 ms  │ ~19x     │
  ├──────┼────────┼──────────┼──────────┤
  │ 1000 │ 65 µs  │ 1.34 ms  │ ~21x     │
  ├──────┼────────┼──────────┼──────────┤
  │ 4096 │ 117 µs │ 3.15 ms  │ ~27x     │
  └──────┴────────┴──────────┴──────────┘

  percentile

  ┌───────┬────────┬──────────┬──────────┐
  │ Rows  │ Native │ Portable │ Slowdown │
  ├───────┼────────┼──────────┼──────────┤
  │ 100   │ 96 µs  │ 495 µs   │ ~5x      │
  ├───────┼────────┼──────────┼──────────┤
  │ 1000  │ 107 µs │ 540 µs   │ ~5x      │
  ├───────┼────────┼──────────┼──────────┤
  │ 10000 │ 397 µs │ 1.01 ms  │ ~2.5x    │
  └───────┴────────┴──────────┴──────────┘
Benchmark code:
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use duckdb::Connection;
use ggsql::utils::{sql_generate_series, sql_percentile};

fn setup_connection() -> Connection {
    Connection::open_in_memory().expect("Failed to open DuckDB in-memory connection")
}

fn create_data_table(conn: &Connection, name: &str, n_rows: usize) {
    conn.execute_batch(&format!(
        "CREATE OR REPLACE TABLE {name} AS \
         SELECT random() * 1000.0 AS val \
         FROM GENERATE_SERIES(0, {}) AS seq(n)",
        n_rows - 1
    ))
    .expect("Failed to create data table");
}

fn bench_generate_series(c: &mut Criterion) {
    let conn = setup_connection();
    let mut group = c.benchmark_group("generate_series");

    for n in [64, 512, 1000, 4096] {
        group.bench_with_input(BenchmarkId::new("native", n), &n, |b, &n| {
            let sql = format!(
                "SELECT n FROM GENERATE_SERIES(0, {}) AS seq(n)",
                n - 1
            );
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let rows = stmt.query_map([], |row| row.get::<_, f64>(0)).unwrap();
                for r in rows {
                    std::hint::black_box(r.unwrap());
                }
            });
        });

        group.bench_with_input(BenchmarkId::new("portable", n), &n, |b, &n| {
            let cte = sql_generate_series(n);
            let sql = format!("WITH RECURSIVE {cte} SELECT n FROM __ggsql_seq__");
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let rows = stmt.query_map([], |row| row.get::<_, f64>(0)).unwrap();
                for r in rows {
                    std::hint::black_box(r.unwrap());
                }
            });
        });
    }

    group.finish();
}

fn bench_percentile(c: &mut Criterion) {
    let conn = setup_connection();
    let mut group = c.benchmark_group("percentile");

    for n_rows in [100, 1000, 10000] {
        let table = format!("data_{n_rows}");
        create_data_table(&conn, &table, n_rows);

        group.bench_with_input(BenchmarkId::new("native", n_rows), &n_rows, |b, _| {
            let sql = format!(
                "SELECT QUANTILE_CONT(val, 0.25) AS q1, QUANTILE_CONT(val, 0.75) AS q3 \
                 FROM {table}"
            );
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let row = stmt
                    .query_row([], |row| {
                        Ok((row.get::<_, f64>(0)?, row.get::<_, f64>(1)?))
                    })
                    .unwrap();
                std::hint::black_box(row);
            });
        });

        group.bench_with_input(BenchmarkId::new("portable", n_rows), &n_rows, |b, _| {
            let from = format!("SELECT * FROM {table}");
            let q1 = sql_percentile("val", 0.25, &from, &[]);
            let q3 = sql_percentile("val", 0.75, &from, &[]);
            let sql = format!("SELECT {q1} AS q1, {q3} AS q3");
            b.iter(|| {
                let mut stmt = conn.prepare(&sql).unwrap();
                let row = stmt
                    .query_row([], |row| {
                        Ok((row.get::<_, f64>(0)?, row.get::<_, f64>(1)?))
                    })
                    .unwrap();
                std::hint::black_box(row);
            });
        });
    }

    group.finish();
}

criterion_group!(benches, bench_generate_series, bench_percentile);
criterion_main!(benches);

@georgestagg (Collaborator, Author) commented:

Just a quick comment to note that b3d6b49 reintroduced SqlDialect in response to the less-than-ideal benchmark results.
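For readers unfamiliar with the pattern, a dialect abstraction of this shape can be sketched as follows. This is a hypothetical Python mock-up, not the actual Rust SqlDialect trait from b3d6b49; all class and method names here are invented for illustration:

```python
class SqlDialect:
    """Hypothetical base: each backend emits its own SQL for an operation."""

    def greatest(self, a: str, b: str) -> str:
        raise NotImplementedError


class DuckDbDialect(SqlDialect):
    def greatest(self, a: str, b: str) -> str:
        # DuckDB has GREATEST natively, so emit the fast built-in form.
        return f"GREATEST({a}, {b})"


class SqliteDialect(SqlDialect):
    def greatest(self, a: str, b: str) -> str:
        # SQLite lacks GREATEST; fall back to the portable CASE WHEN form.
        return f"CASE WHEN {a} > {b} THEN {a} ELSE {b} END"


duck_sql = DuckDbDialect().greatest("x", "y")
lite_sql = SqliteDialect().greatest("x", "y")
print(duck_sql)  # GREATEST(x, y)
print(lite_sql)  # CASE WHEN x > y THEN x ELSE y END
```

Query-building code asks the dialect for each fragment, so every backend gets its fastest supported SQL while the call sites stay identical.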

@thomasp85 (Collaborator) left a comment:

To the extent of my abilities this looks good. There are a few comments, but if they have good answers they shouldn't stop you from merging.

// where MAX would be interpreted as the aggregate function
format!(
"(GREATEST(CEIL(({x} - {min} + {w} * 0.5) / {w}) - 1, 0)) * {w} + {min} - {w} * 0.5",
"(CASE WHEN CEIL(({x} - {min} + {w} * 0.5) / {w}) - 1 > 0 THEN CEIL(({x} - {min} + {w} * 0.5) / {w}) - 1 ELSE 0 END) * {w} + {min} - {w} * 0.5",
Collaborator:
Any reason we are not using the dialect here?

Collaborator (Author):
Good spot! I just missed it when adding back SqlDialect.

@thomasp85 (Collaborator):

All in all, I really like the Dialect approach. I'm sure it will come in handy as we expand our reader support.

@georgestagg georgestagg merged commit 88baccd into main Mar 12, 2026
30 checks passed
@georgestagg georgestagg deleted the sqlite branch March 12, 2026 14:44
Successfully merging this pull request closes the issue "Readers should be able to provide alternative SQL clauses".

2 participants